MSQ-Index: A Succinct Index for Fast Graph Similarity Search

Abstract

Graph similarity search under the graph edit distance constraint has received considerable attention in many applications, such as bioinformatics, data mining, pattern recognition, and social networks. Existing methods for this problem have limited scalability because of the huge amount of memory they consume when handling very large graph databases with tens of millions of graphs. In this article, we present a succinct index that incorporates succinct data structures and hybrid encoding to achieve improved query time performance with minimal space usage. Specifically, the space usage of our index requires only 5-15 percent of the previous state-of-the-art indexing size on the tested data while at the same time achieving several times acceleration in query time. We also improve the query performance by augmenting the global filter with range searching, which allows us to perform similarity search in a reduced region. In addition, we propose two effective lower bounds together with a boosting technique to obtain the smallest possible candidate set. Extensive experiments demonstrate that the proposed approach is superior in both space and filtering to the state-of-the-art approaches. To the best of our knowledge, our index is the first in-memory index for this problem that successfully scales to cope with the large dataset of 25 million chemical structure graphs from the PubChem dataset. The source code is available online.
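The abstract does not spell out its two lower bounds, so the following is only a minimal sketch of the general filtering idea, using the widely known label-multiset lower bound on graph edit distance under unit edit costs. It is not the MSQ-Index method. The names `label_lower_bound` and `filter_candidates`, and the representation of a graph as a pair of label lists, are assumptions made for illustration.

```python
from collections import Counter

def label_lower_bound(g, h):
    """Label-multiset lower bound on graph edit distance (unit costs).

    Each graph is a hypothetical pair (vertex_labels, edge_labels),
    where both members are lists of hashable labels.
    """
    g_vertices, g_edges = g
    h_vertices, h_edges = h
    # Size of the multiset intersection of vertex labels and of edge labels.
    common_vertices = sum((Counter(g_vertices) & Counter(h_vertices)).values())
    common_edges = sum((Counter(g_edges) & Counter(h_edges)).values())
    # Every label without a partner needs at least one edit operation.
    vertex_cost = max(len(g_vertices), len(h_vertices)) - common_vertices
    edge_cost = max(len(g_edges), len(h_edges)) - common_edges
    return vertex_cost + edge_cost

def filter_candidates(database, query, tau):
    """Return indices of graphs whose lower bound does not exceed tau.

    Graphs that are filtered out cannot be within edit distance tau of
    the query; the survivors still need exact verification.
    """
    return [i for i, g in enumerate(database)
            if label_lower_bound(g, query) <= tau]

query = (["C", "C", "O"], ["single", "double"])
database = [
    (["C", "C", "O"], ["single", "double"]),
    (["C", "N", "N", "N"], ["single", "single", "single"]),
    (["C", "C", "N"], ["single", "double"]),
]
candidates = filter_candidates(database, query, tau=1)
```

A tighter lower bound prunes more graphs before the expensive exact edit distance computation, which is why the abstract emphasizes obtaining the smallest possible candidate set.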




Journal

Journal title: IEEE Transactions on Knowledge and Data Engineering

Year: 2021

ISSN: 1558-2191, 1041-4347, 2326-3865

DOI: https://doi.org/10.1109/tkde.2019.2954527